Safety
Overview
The safety policy can be used to detect user-provided inputs or model responses that contain unsafe material, such as obscene, toxic, or harmful content.
Input Safety Policy Definition
Dynamo AI uses the following policy definition for detecting unsafe content.
- Description: Safeguard chat bot from queries with unsafe content, context, or intent.
- Disallowed:
- Illegal or criminal activities
- Violent, harassment, or hurtful topics
- Hateful, derogatory, or discriminative speech
- Dangerous, unethical, or harmful topics
- Encouragement or provision of methods of self-harm or suicide
- Abusive, biased, or unfair remarks
- Sexual references including but not limited to sexual innuendos, acts, devices, and favors
- Sexist or racist discourse including implicit and explicit discrimination or stereotyping
- Untruthful, manipulative, or misleading statements
- Creation, acquisition, or operation of weapons
- Procurement or use of harmful substances
- Vulgar or offensive language
- Child harm topics
- Malware topics
- Privacy violation topics
- Allowed:
- General harmless queries
- Requests for responsible information on violence and discrimination
- Requests for responsible sexual education, health, or consent
- Requests for factual resources for mental health
- Queries on resources for managing conflicts and reporting harassment
- Queries on promoting diversity, fairness, and inclusion
- Queries on responsible, harmless, and safe information on substances
- Requests for explanations of ethical and responsible behavior
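For illustration, the input policy definition above can be expressed as a structured configuration. The schema below (field names and shape) is a hypothetical sketch, not Dynamo AI's actual policy format:

```python
# Hypothetical structured form of the input safety policy above.
# The "description" / "disallowed" / "allowed" schema is illustrative only.
input_safety_policy = {
    "description": (
        "Safeguard chat bot from queries with unsafe content, "
        "context, or intent."
    ),
    "disallowed": [
        "Illegal or criminal activities",
        "Violent, harassment, or hurtful topics",
        "Hateful, derogatory, or discriminative speech",
        "Dangerous, unethical, or harmful topics",
        "Encouragement or provision of methods of self-harm or suicide",
        "Abusive, biased, or unfair remarks",
        "Sexual references including but not limited to sexual "
        "innuendos, acts, devices, and favors",
        "Sexist or racist discourse including implicit and explicit "
        "discrimination or stereotyping",
        "Untruthful, manipulative, or misleading statements",
        "Creation, acquisition, or operation of weapons",
        "Procurement or use of harmful substances",
        "Vulgar or offensive language",
        "Child harm topics",
        "Malware topics",
        "Privacy violation topics",
    ],
    "allowed": [
        "General harmless queries",
        "Requests for responsible information on violence and discrimination",
        "Requests for responsible sexual education, health, or consent",
        "Requests for factual resources for mental health",
        "Queries on resources for managing conflicts and reporting harassment",
        "Queries on promoting diversity, fairness, and inclusion",
        "Queries on responsible, harmless, and safe information on substances",
        "Requests for explanations of ethical and responsible behavior",
    ],
}
```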
Output Safety Policy Definition
Coming soon.
Safety Policy Actions
You can control what happens to inputs and outputs when the safety policy is applied using the actions below:
- Flag: flag unsafe content for moderator review while still allowing it through
- Block: block user inputs or model outputs containing unsafe content so they are not delivered
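The difference between the two actions can be sketched as follows. The function and result type below are hypothetical illustrations of the flag-versus-block behavior, not Dynamo AI's API:

```python
from dataclasses import dataclass

# Hypothetical sketch of applying the Flag and Block actions.
# Names and return shape are illustrative, not Dynamo AI's actual API.
@dataclass
class ModerationResult:
    unsafe: bool                # did the safety policy match?
    action: str                 # configured action: "flag" or "block"
    delivered: bool             # does the content reach its destination?
    flagged_for_review: bool    # is it queued for a moderator?

def apply_safety_action(content_is_unsafe: bool, action: str) -> ModerationResult:
    if not content_is_unsafe:
        # Safe content passes through untouched regardless of the action.
        return ModerationResult(False, action, delivered=True, flagged_for_review=False)
    if action == "flag":
        # Flag: content still flows through, but is queued for moderator review.
        return ModerationResult(True, action, delivered=True, flagged_for_review=True)
    if action == "block":
        # Block: content is withheld entirely.
        return ModerationResult(True, action, delivered=False, flagged_for_review=False)
    raise ValueError(f"unknown action: {action}")
```

Under this sketch, Flag is a non-intrusive audit path, while Block is an enforcement path that stops unsafe content from being delivered at all.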